1.
Nature ; 622(7981): 41-47, 2023 Oct.
Article in English | MEDLINE | ID: mdl-37794265

ABSTRACT

Scientists have been trying to identify every gene in the human genome since the initial draft was published in 2001. In the years since, much progress has been made in identifying protein-coding genes, currently estimated to number fewer than 20,000, with an ever-expanding number of distinct protein-coding isoforms. Here we review the status of the human gene catalogue and the efforts to complete it in recent years. Besides the ongoing annotation of protein-coding genes, their isoforms and pseudogenes, the invention of high-throughput RNA sequencing and other technological breakthroughs have led to a rapid growth in the number of reported non-coding RNA genes. For most of these non-coding RNAs, the functional relevance is currently unclear; we look at recent advances that offer paths forward to identifying their functions and towards eventually completing the human gene catalogue. Finally, we examine the need for a universal annotation standard that includes all medically significant genes and maintains their relationships with different reference genomes for the use of the human gene catalogue in clinical settings.


Subject(s)
Genes , Genome, Human , Molecular Sequence Annotation , Protein Isoforms , Humans , Genome, Human/genetics , Molecular Sequence Annotation/standards , Molecular Sequence Annotation/trends , Protein Isoforms/genetics , Human Genome Project , Pseudogenes , RNA/genetics
2.
Cell Biol Toxicol ; 36(3): 261-272, 2020 06.
Article in English | MEDLINE | ID: mdl-31599373

ABSTRACT

In the advanced stages, malignant melanoma (MM) has a very poor prognosis. Due to tremendous efforts in cancer research over the last 10 years, and the introduction of novel therapies such as targeted therapies and immunomodulators, the rather dark horizon of the median survival has dramatically changed from under 1 year to several years. With the advent of proteomics, deep-mining studies can reach low-abundant expression levels. The complexity of the proteome, however, still surpasses the dynamic range capabilities of current analytical techniques. Consequently, many predicted protein products with potential biological functions have not yet been verified in experimental proteomic data. This category of 'missing proteins' (MP) comprises all proteins that have been predicted but are currently unverified. As part of the initiative launched in 2016 in the USA, the European Cancer Moonshot Center has performed numerous deep proteomics analyses on samples from MM patients. In this study, nine MPs were clearly identified by mass spectrometry in MM metastases. Some MPs significantly correlated with proteins that possess identical PFAM structural domains, and other MPs were significantly associated with cancer-related proteins. To our knowledge, this is the first study in which unknown and novel proteins have been annotated in metastatic melanoma tumour tissue.


Subject(s)
Melanoma/genetics , Neoplasm Metastasis/genetics , Proteomics/methods , Adult , Biomarkers, Tumor/genetics , Female , Genome, Human/genetics , Humans , Male , Middle Aged , Molecular Sequence Annotation/methods , Molecular Sequence Annotation/trends , Prognosis , Proteome/genetics , Proteome/metabolism , Skin Neoplasms/genetics , Melanoma, Cutaneous Malignant
3.
Genome Biol ; 20(1): 244, 2019 11 19.
Article in English | MEDLINE | ID: mdl-31744546

ABSTRACT

BACKGROUND: The Critical Assessment of Functional Annotation (CAFA) is an ongoing, global, community-driven effort to evaluate and improve the computational annotation of protein function. RESULTS: Here, we report on the results of the third CAFA challenge, CAFA3, that featured an expanded analysis over the previous CAFA rounds, both in terms of volume of data analyzed and the types of analysis performed. In a major new development, computational predictions and assessment goals drove some of the experimental assays, resulting in new functional annotations for more than 1000 genes. Specifically, we performed experimental whole-genome mutation screening in Candida albicans and Pseudomonas aeruginosa genomes, which provided us with genome-wide experimental data for genes associated with biofilm formation and motility. We further performed targeted assays on selected genes in Drosophila melanogaster, which we suspected of being involved in long-term memory. CONCLUSION: We conclude that while predictions of the molecular function and biological process annotations have slightly improved over time, those of the cellular component have not. Term-centric prediction of experimental annotations remains equally challenging; although the performance of the top methods is significantly better than the expectations set by baseline methods in C. albicans and D. melanogaster, it leaves considerable room and need for improvement. Finally, we report that the CAFA community now involves a broad range of participants with expertise in bioinformatics, biological experimentation, biocuration, and bio-ontologies, working together to improve functional annotation, computational function prediction, and our ability to manage big data in the era of large experimental screens.


Subject(s)
Molecular Sequence Annotation/trends , Animals , Biofilms , Candida albicans/genetics , Drosophila melanogaster/genetics , Genome, Bacterial , Genome, Fungal , Humans , Locomotion , Memory, Long-Term , Molecular Sequence Annotation/methods , Pseudomonas aeruginosa/genetics
4.
Gigascience ; 7(8), 2018 08 01.
Article in English | MEDLINE | ID: mdl-30107399

ABSTRACT

Background: The Gene Ontology (GO) is one of the most widely used resources in molecular and cellular biology, largely through the use of "enrichment analysis." To facilitate informed use of GO, we present GOtrack (https://gotrack.msl.ubc.ca), which provides access to historical records and trends in the GO and GO annotations. Findings: GOtrack gives users access to gene- and term-level information on annotations for nine model organisms as well as an interactive tool that measures the stability of enrichment results over time for user-provided "hit lists" of genes. To document the effects of GO evolution on enrichment, we analyzed more than 2,500 published hit lists of human genes (most older than 9 years); 53% of hit lists were considered to yield significantly stable enrichment results. Conclusions: Because stability is far from assured for any individual hit list, GOtrack can lead to more informed and cautious application of GO to genomics research.


Subject(s)
Gene Ontology/trends , Genomics/methods , Molecular Sequence Annotation/trends , Animals , Eukaryota/genetics , Humans
5.
Microb Biotechnol ; 11(4): 588-605, 2018 07.
Article in English | MEDLINE | ID: mdl-29806194

ABSTRACT

Science and engineering rely on the accumulation and dissemination of knowledge to make discoveries and create new designs. Discovery-driven genome research rests on knowledge passed on via gene annotations. In response to the deluge of sequencing big data, standard annotation practice employs automated procedures that rely on majority rules. We argue this hinders progress through the generation and propagation of errors, leading investigators into blind alleys. More subtly, this inductive process discourages the discovery of novelty, which remains essential in biological research and reflects the nature of biology itself. Annotation systems, rather than being repositories of facts, should be tools that support multiple modes of inference. By combining deduction, induction and abduction, investigators can generate hypotheses when accurate knowledge is extracted from model databases. A key stance is to depart from 'the sequence tells the structure tells the function' fallacy, placing function first. We illustrate our approach with examples of critical or unexpected pathways, using MicroScope to demonstrate how tools can be implemented following the principles we advocate. We end with a challenge to the reader.


Subject(s)
Bacteria/genetics , Genome, Bacterial , Molecular Sequence Annotation/trends , Bacteria/classification , Bacteria/isolation & purification , Big Data , Computational Biology , Databases, Genetic , Molecular Sequence Annotation/methods
7.
Arq. bras. med. vet. zootec ; 68(2): 489-496, mar.-abr. 2016. tab
Article in Portuguese | LILACS | ID: lil-779784

ABSTRACT

The aim of this study was to estimate genetic parameters for partial and cumulative egg production in a female line of commercial broilers. Ten monthly periods between 25 and 64 weeks, three partial periods (25 to 32, 33 to 48 and 49 to 64 weeks) and three cumulative periods (from 25 to 30, 40 and 50 weeks of age) were considered, along with total egg production. Covariance components and genetic parameters were estimated by restricted maximum likelihood under an animal model that included the fixed effect of incubation and the additive genetic and residual random effects. Heritability estimates ranged from 0.12 to 0.41. The periods before and after peak production showed greater genetic variability. Genetic correlations among the egg production periods studied ranged from -0.12 to 0.98. In general, the pattern of variation was similar across the strategies evaluated, and all were genetically associated with total egg production. The results show that total egg production can be improved by selection on partial records. However, considering the relative efficiency of selection, the second month and the periods from the fortieth week of production onward would be the most suitable.


Subject(s)
Animals , Poultry/anatomy & histology , Poultry/genetics , Eggs , Genetic Load , Chickens/genetics , Molecular Sequence Annotation/trends , Pedigree , Phenotype
8.
Nat Struct Mol Biol ; 22(1): 5-7, 2015 Jan.
Article in English | MEDLINE | ID: mdl-25565026

ABSTRACT

Recent advances in RNA-sequencing technologies have led to the discovery of thousands of previously unannotated noncoding transcripts, including many long noncoding RNAs (lncRNAs) whose functions remain largely unknown. Here we discuss considerations and best practices in lncRNA identification and annotation, which we hope will foster functional and mechanistic exploration.


Subject(s)
Gene Expression Regulation , RNA, Untranslated/genetics , RNA, Untranslated/physiology , Molecular Biology/trends , Molecular Sequence Annotation/trends
10.
Methods ; 79-80: 32-40, 2015 Jun.
Article in English | MEDLINE | ID: mdl-25308971

ABSTRACT

As high-throughput methods, such as whole genome genotyping arrays, whole exome sequencing (WES) and whole genome sequencing (WGS), have detected huge numbers of genetic variants associated with human diseases, functional annotation of these variants is an indispensable step in understanding disease etiology. Large-scale functional genomics projects, such as the ENCODE Project and the Roadmap Epigenomics Project, provide genome-wide profiling of functional elements across different human cell types and tissues. Given the urgent need to identify disease-causal variants, a comprehensive and easy-to-use annotation tool is in high demand. Here we review and discuss current progress and trends in the variant annotation field. Furthermore, we introduce a comprehensive web portal for annotating human genetic variants. We use gene-based features and the latest functional genomics datasets to annotate single nucleotide variants (SNVs) in humans at whole genome scale. We further apply several function prediction algorithms to annotate SNVs that might affect different biological processes, including transcriptional gene regulation, alternative splicing, post-transcriptional regulation, translation and post-translational modifications. The SNVrap web portal is freely available at http://jjwanglab.org/snvrap.


Subject(s)
Molecular Sequence Annotation/methods , Polymorphism, Single Nucleotide , Algorithms , Alternative Splicing , Gene Expression Regulation , Genetic Variation , High-Throughput Nucleotide Sequencing , Humans , Molecular Sequence Annotation/trends
11.
Proc Natl Acad Sci U S A ; 111(10): 3733-8, 2014 Mar 11.
Article in English | MEDLINE | ID: mdl-24567391

ABSTRACT

The exponential growth of protein sequence data provides an ever-expanding body of unannotated and misannotated proteins. The National Institutes of Health-supported Protein Structure Initiative and related worldwide structural genomics efforts facilitate functional annotation of proteins through structural characterization. Recently there have been profound changes in the taxonomic composition of sequence databases, which are effectively redefining the scope and contribution of these large-scale structure-based efforts. The faster-growing bacterial genomic entries have overtaken the eukaryotic entries over the last 5 y, but also have become more redundant. Despite the enormous increase in the number of sequences, the overall structural coverage of proteins--including proteins for which reliable homology models can be generated--on the residue level has increased from 30% to 40% over the last 10 y. Structural genomics efforts contributed ∼50% of this new structural coverage, despite determining only ∼10% of all new structures. Based on current trends, it is expected that ∼55% structural coverage (the level required for significant functional insight) will be achieved within 15 y, whereas without structural genomics efforts, realizing this goal will take approximately twice as long.
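
The projection in this abstract can be checked with simple arithmetic. The sketch below assumes that coverage grows linearly at the rate observed over the last 10 y and that removing structural genomics would remove its ~50% share of new coverage; both are simplifying assumptions made here for illustration, not a model stated in the abstract. The function name `years_to_target` is hypothetical.

```python
def years_to_target(current_pct, target_pct, rate_pct_per_year):
    """Years needed to reach target coverage at a constant (linear) rate."""
    if rate_pct_per_year <= 0:
        raise ValueError("rate must be positive")
    return (target_pct - current_pct) / rate_pct_per_year


# Figures taken from the abstract.
start_pct, current_pct, span_years = 30.0, 40.0, 10.0
target_pct = 55.0
sg_share = 0.5  # ~50% of new coverage attributed to structural genomics

# Assumption: the observed average rate continues unchanged.
overall_rate = (current_pct - start_pct) / span_years  # percentage points per year
# Assumption: without structural genomics, only the remaining share accrues.
rate_without_sg = overall_rate * (1.0 - sg_share)

with_sg = years_to_target(current_pct, target_pct, overall_rate)
without_sg = years_to_target(current_pct, target_pct, rate_without_sg)
print(with_sg, without_sg)
```

Under these assumptions the two estimates differ by a factor of two, which is consistent with the "approximately twice as long" statement in the abstract.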


Subject(s)
Databases, Protein , Molecular Sequence Annotation/trends , Proteins/chemistry , Proteomics/trends , Computational Biology , Molecular Sequence Annotation/methods , Species Specificity
15.
Nat Rev Genet ; 12(10): 703-14, 2011 Sep 16.
Article in English | MEDLINE | ID: mdl-21921926

ABSTRACT

Determination of haplotype phase is becoming increasingly important as we enter the era of large-scale sequencing because many of its applications, such as imputing low-frequency variants and characterizing the relationship between genetic variation and disease susceptibility, are particularly relevant to sequence data. Haplotype phase can be generated through laboratory-based experimental methods, or it can be estimated using computational approaches. We assess the haplotype phasing methods that are available, focusing in particular on statistical methods, and we discuss the practical aspects of their application. We also describe recent developments that may transform this field, particularly the use of identity-by-descent for computational phasing.


Subject(s)
Data Collection/trends , Haplotypes/genetics , Base Sequence , Computational Biology/methods , Computational Biology/trends , Data Collection/methods , Databases, Genetic/trends , Genome-Wide Association Study/methods , Genome-Wide Association Study/trends , Haplotypes/physiology , High-Throughput Nucleotide Sequencing/methods , High-Throughput Nucleotide Sequencing/trends , Humans , Molecular Sequence Annotation/methods , Molecular Sequence Annotation/trends , Polymorphism, Single Nucleotide/physiology
16.
Nat Rev Genet ; 12(10): 671-82, 2011 Sep 07.
Article in English | MEDLINE | ID: mdl-21897427

ABSTRACT

Transcriptomics studies often rely on partial reference transcriptomes that fail to capture the full catalogue of transcripts and their variations. Recent advances in sequencing technologies and assembly algorithms have facilitated the reconstruction of the entire transcriptome by deep RNA sequencing (RNA-seq), even without a reference genome. However, transcriptome assembly from billions of RNA-seq reads, which are often very short, poses a significant informatics challenge. This Review summarizes the recent developments in transcriptome assembly approaches - reference-based, de novo and combined strategies - along with some perspectives on transcriptome assembly in the near future.


Subject(s)
Gene Expression Profiling/trends , Animals , Base Sequence , Cloning, Molecular , Gene Expression Profiling/methods , Gene Library , Humans , Models, Biological , Molecular Sequence Annotation/methods , Molecular Sequence Annotation/trends , Molecular Sequence Data , Sequence Analysis, DNA/methods , Sequence Analysis, DNA/trends , Sequence Analysis, RNA/methods , Sequence Analysis, RNA/trends
17.
Curr Protein Pept Sci ; 12(6): 503-7, 2011 Sep.
Article in English | MEDLINE | ID: mdl-21787300

ABSTRACT

Evidence is accumulating that small open reading frames (sORFs, <100 codons) play key roles in many important biological processes. Yet they are generally ignored in gene annotation even though they are far more abundant than genes with more than 100 codons. Here, we demonstrate that popular homology search and codon-index techniques perform poorly for small genes relative to larger genes, while a method dedicated to sORF discovery has a similar level of accuracy as homology search. The result is largely due to the small dataset of experimentally verified sORFs available for homology search and for training ab initio techniques. It highlights the urgent need for both experimental and computational studies in order to further advance the accuracy of sORF prediction.
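
To make the size threshold concrete, the following is a minimal sketch of a naive forward-strand scan that reports open reading frames shorter than 100 codons. It is not the method evaluated in the article. The function name `find_sorfs`, the `min_codons` floor, the restriction to ATG starts and the standard stop codons, and the choice to count codons up to but excluding the stop are all assumptions made for illustration.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}  # standard genetic code


def find_sorfs(seq, max_codons=100, min_codons=10):
    """Return (start, end, n_codons) for forward-strand ORFs with
    min_codons <= n_codons < max_codons.

    start/end are 0-based, end-exclusive, and end includes the stop codon.
    n_codons counts the ATG and every codon before the stop.
    """
    seq = seq.upper()
    hits = []
    for start in range(len(seq) - 2):
        if seq[start:start + 3] != "ATG":
            continue
        # Walk in-frame from the codon after ATG until the first stop.
        for pos in range(start + 3, len(seq) - 2, 3):
            if seq[pos:pos + 3] in STOP_CODONS:
                n_codons = (pos - start) // 3
                if min_codons <= n_codons < max_codons:
                    hits.append((start, pos + 3, n_codons))
                break
    return hits


example = "CC" + "ATG" + "GCT" * 11 + "TAA" + "GG"
print(find_sorfs(example))
```

A scan like this reports every in-frame ATG separately, so nested candidates that share a stop codon appear as distinct hits, and it says nothing about whether a candidate is actually translated, which is the experimental gap the abstract points to.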


Subject(s)
Codon/genetics , Computational Biology/methods , Molecular Sequence Annotation/methods , Open Reading Frames/genetics , Computational Biology/trends , Databases, Protein , Forecasting , Molecular Sequence Annotation/trends , Saccharomyces cerevisiae/genetics , Saccharomyces cerevisiae Proteins/genetics
19.
BMC Biol ; 8: 149, 2010 Dec 21.
Article in English | MEDLINE | ID: mdl-21176148

ABSTRACT

BACKGROUND: Discovery that the transcriptional output of the human genome is far more complex than predicted by the current set of protein-coding annotations and that most RNAs produced do not appear to encode proteins has transformed our understanding of genome complexity and suggests new paradigms of genome regulation. However, the fraction of all cellular RNA whose function we do not understand and the fraction of the genome that is utilized to produce that RNA remain controversial. This is not simply a bookkeeping issue because the degree to which this un-annotated transcription is present has important implications with respect to its biologic function and to the general architecture of genome regulation. For example, efforts to elucidate how non-coding RNAs (ncRNAs) regulate genome function will be compromised if that class of RNAs is dismissed as simply 'transcriptional noise'. RESULTS: We show that the relative mass of RNA whose function and/or structure we do not understand (the so-called 'dark matter' RNAs), as a proportion of all non-ribosomal, non-mitochondrial human RNA (mt-RNA), can be greater than that of protein-encoding transcripts. This observation is obscured in studies that focus only on polyA-selected RNA, a method that enriches for protein-coding RNAs and at the same time discards the vast majority of RNA prior to analysis. We further show the presence of a large number of very long, abundantly transcribed regions (100s of kb) in intergenic space and that expression of these regions is associated with neoplastic transformation. These overlap some regions found previously in normal human embryonic tissues and raise an interesting hypothesis as to the function of these ncRNAs in both early development and neoplastic transformation. CONCLUSIONS: We conclude that 'dark matter' RNA can constitute the majority of non-ribosomal, non-mitochondrial RNA and that a significant fraction arises from numerous very long, intergenic transcribed regions that could be involved in neoplastic transformation.


Subject(s)
Genome, Human , Molecular Sequence Annotation/standards , RNA, Nuclear/genetics , Adolescent , Animals , Bone Neoplasms/genetics , Bone Neoplasms/metabolism , Bone Neoplasms/pathology , Brain/metabolism , Drosophila/genetics , Genome, Human/genetics , Genome, Insect , Humans , K562 Cells , Knowledge Bases , Liver/metabolism , Molecular Sequence Annotation/trends , Neoplasm Metastasis/genetics , RNA/genetics , RNA, Mitochondrial , RNA, Nuclear/metabolism , RNA, Ribosomal/genetics , Sarcoma, Ewing/genetics , Sarcoma, Ewing/metabolism , Sarcoma, Ewing/pathology , Sequence Analysis, RNA/standards